
    Composable Deep Reinforcement Learning for Robotic Manipulation

    Model-free deep reinforcement learning has been shown to exhibit good performance in domains ranging from video games to simulated robotic manipulation and locomotion. However, model-free methods are known to perform poorly when the interaction time with the environment is limited, as is the case for most real-world robotic tasks. In this paper, we study how maximum entropy policies trained using soft Q-learning can be applied to real-world robotic manipulation. The application of this method to real-world manipulation is facilitated by two important features of soft Q-learning. First, soft Q-learning can learn multimodal exploration strategies by learning policies represented by expressive energy-based models. Second, we show that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the divergence between the composed policies. This compositionality provides an especially valuable tool for real-world manipulation, where constructing new policies by composing existing skills can provide a large gain in efficiency over training from scratch. Our experimental evaluation demonstrates that soft Q-learning is substantially more sample efficient than prior model-free deep reinforcement learning methods, and that compositionality can be performed for both simulated and real-world tasks.
    Comment: Videos: https://sites.google.com/view/composing-real-world-policies
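
    As a minimal illustration of the composition idea described in this abstract, the sketch below averages two soft Q-functions and samples actions from the resulting Boltzmann (maximum-entropy) policy over a small discrete action set. The function names, temperature, and Q-values are illustrative assumptions, not the paper's implementation (which operates on continuous actions with energy-based models).

```python
import numpy as np

def boltzmann_policy(q_values, alpha=1.0):
    """Maximum-entropy (Boltzmann) policy over a discrete action set."""
    logits = q_values / alpha
    logits = logits - logits.max()        # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def composed_policy(q_a, q_b, alpha=1.0):
    """Compose two soft Q-functions by averaging them; the Boltzmann
    policy of the average approximately solves both tasks jointly."""
    return boltzmann_policy(0.5 * (q_a + q_b), alpha)

# Illustrative Q-values for two skills at one state with four actions.
q_reach = np.array([1.0, 0.2, 0.5, -0.3])
q_avoid = np.array([0.1, 0.9, 0.4, -1.0])
print(composed_policy(q_reach, q_avoid, alpha=0.5))
```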

    Quality Diversity for Multi-task Optimization

    Quality Diversity (QD) algorithms are a recent family of optimization algorithms that search for a large set of diverse but high-performing solutions. In some specific situations, they can solve multiple tasks at once. For instance, they can find the joint positions required for a robotic arm to reach a set of points, which can also be solved by running a classic optimizer for each target point. However, they cannot solve multiple tasks when the fitness needs to be evaluated independently for each task (e.g., optimizing policies to grasp many different objects). In this paper, we propose an extension of the MAP-Elites algorithm, called Multi-task MAP-Elites, that solves multiple tasks when the fitness function depends on the task. We evaluate it on a simulated parameterized planar arm (10-dimensional search space; 5000 tasks) and on a simulated 6-legged robot with legs of different lengths (36-dimensional search space; 2000 tasks). The results show that in both cases our algorithm outperforms the optimization of each task separately with the CMA-ES algorithm.
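
    A minimal sketch of the Multi-task MAP-Elites idea summarized above: the archive holds one cell per task, new candidates are produced by mutating elites taken from other cells (so progress on one task can seed another), and a candidate is evaluated on a single task and kept only if it beats that task's current elite. The selection and mutation operators, all names, and the toy fitness function are illustrative assumptions, not the paper's exact configuration.

```python
import random

def multitask_map_elites(tasks, fitness, dim, iterations=20000, sigma=0.1):
    """One archive cell per task; each cell keeps the best solution found so far."""
    archive = {}  # task index -> (solution, fitness)
    for _ in range(iterations):
        task = random.randrange(len(tasks))
        if archive and random.random() < 0.9:
            # Mutate an elite from a random cell: cross-task transfer.
            parent, _ = random.choice(list(archive.values()))
            child = [x + random.gauss(0.0, sigma) for x in parent]
        else:
            # Occasionally (re)start from a random solution.
            child = [random.uniform(-1.0, 1.0) for _ in range(dim)]
        f = fitness(child, tasks[task])
        if task not in archive or f > archive[task][1]:
            archive[task] = (child, f)
    return archive

# Toy example: each task is a scalar target; fitness rewards matching it.
tasks = [t / 10.0 for t in range(10)]
elites = multitask_map_elites(tasks, lambda s, t: -abs(sum(s) - t), dim=3)
```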

    Real-time Flexibility Feedback for Closed-loop Aggregator and System Operator Coordination

    Aggregators have emerged as crucial tools for the coordination of distributed, controllable loads. However, to be used effectively, aggregators must be able to communicate the available flexibility of the loads they control to the system operator in a manner that is both (i) concise enough to be scalable to aggregators governing hundreds or even thousands of loads and (ii) informative enough to allow the system operator to send control signals to the aggregator that lead to optimization of system-level objectives, such as cost minimization, and do not violate private constraints of the loads, such as satisfying specific load demands. In this paper, we present the design of a real-time flexibility feedback signal based on maximization of entropy. The design provides a concise and informative signal that can be used by the system operator to perform online cost minimization and real-time capacity estimation, while provably satisfying the private constraints of the loads. In addition to deriving analytic properties of the design, we illustrate the effectiveness of the design using a dataset from an adaptive electric vehicle charging network.
    Comment: The Eleventh ACM International Conference on Future Energy Systems (e-Energy'20)
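
    The abstract does not spell out the construction of the feedback signal; purely as an illustration of the maximum-entropy principle it invokes, the sketch below computes the maximum-entropy distribution over a set of feasible aggregate power levels, optionally tilted (via bisection on a Lagrange multiplier) so that its mean matches a requested consumption level. The discretization, the constraint, and all names are assumptions, not the paper's design.

```python
import math

def max_entropy_feedback(levels, mean_target=None, tol=1e-9):
    """Maximum-entropy distribution over feasible aggregate power levels.
    With no extra constraint it is uniform; with a mean-consumption
    constraint it is exponentially tilted, p_i proportional to exp(lam * x_i)."""
    if mean_target is None:
        return [1.0 / len(levels)] * len(levels)

    lo_x, hi_x = min(levels), max(levels)
    xs = [(x - lo_x) / (hi_x - lo_x) for x in levels]   # normalize to [0, 1]
    target = (mean_target - lo_x) / (hi_x - lo_x)

    def tilted(lam):
        w = [math.exp(lam * x) for x in xs]
        z = sum(w)
        return [wi / z for wi in w]

    lo, hi = -50.0, 50.0          # bisection on the Lagrange multiplier
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        p = tilted(mid)
        if sum(pi * xi for pi, xi in zip(p, xs)) < target:
            lo = mid
        else:
            hi = mid
    return tilted(0.5 * (lo + hi))

# Feasible aggregate power levels (kW) and a requested mean of 30 kW.
print(max_entropy_feedback([0, 10, 20, 30, 40, 50], mean_target=30.0))
```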

    Strategies for Using Proximal Policy Optimization in Mobile Puzzle Games

    While traditionally a labour-intensive task, the testing of game content is progressively becoming more automated. Among the many directions in which this automation is taking shape, automatic play-testing is one of the most promising, thanks in part to advances in supervised and reinforcement learning (RL) algorithms. However, these types of algorithms, while extremely powerful, often suffer in production environments from issues with reliability and transparency in their training and usage. In this work, we investigate and evaluate strategies for applying the popular RL method Proximal Policy Optimization (PPO) to a casual mobile puzzle game, with a specific focus on improving its reliability in training and its generalization during game playing. We implemented and tested a number of different strategies on a real-world mobile puzzle game (Lily's Garden from Tactile Games). We isolated the conditions that lead to failures in either training or generalization during testing, and we identified strategies that ensure more stable behaviour of the algorithm in this game genre.
    Comment: 10 pages, 8 figures, to be published in the 2020 Foundations of Digital Games conference
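
    The game environment studied above is proprietary, so as a minimal sketch of the kind of PPO training run the abstract discusses, the snippet below uses the stable-baselines3 PPO implementation on a stand-in Gymnasium environment. The environment, hyperparameters, and rollout check are assumptions for illustration only, not the paper's setup.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Stand-in environment; the paper's puzzle-game environment is proprietary.
env = gym.make("CartPole-v1")

# Illustrative hyperparameters; tuning these is part of what affects
# training reliability in practice.
model = PPO("MlpPolicy", env, n_steps=2048, learning_rate=3e-4, verbose=0)
model.learn(total_timesteps=50_000)

# Quick check: roll out the trained policy deterministically for one episode.
obs, _ = env.reset()
total_reward = 0.0
for _ in range(500):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward
    if terminated or truncated:
        break
print("episode return:", total_reward)
```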